Optimal objective function in high-dimensional regression

Authors

  • Derek Bean
  • Peter Bickel
  • Noureddine El Karoui
Abstract

In this article we study a fundamental statistical problem: how to optimally pick the objective to be minimized in a parametric regression when we have information about the error distribution. The classical answer to this problem, maximum likelihood, was given by Fisher (5) in the specific case of multinomial models and then at succeeding levels of generality by Cramér (3), Hájek (7) and, above all, Le Cam (11). For instance, for p fixed or p/n → 0 fast enough, least squares is optimal for Gaussian errors while LAD is optimal for double-exponential errors. We shall show that this is no longer true in the regime we consider, with the answer depending, in general, on the limit of the ratio p/n as well as on the form of the error distribution.

Our analysis in this paper is carried out in the setting of Gaussian predictors, though, as we explain below, this assumption should be relaxable to situations where the distribution of the predictors satisfies certain concentration properties for quadratic forms. We carry out our analysis in a regime which has been essentially unexplored, namely 0 < p/n < 1, where p is the number of predictor variables and n is the number of independent observations.

Since in most fields of application situations where p as well as n is large have become paramount, there has been a huge amount of literature on the case where p/n does not tend to 0 but the number of “relevant” predictors is small. In this case the objective function, quadratic (least squares) or otherwise (ℓ1 for LAD), has been modified to include a penalty (usually ℓ1) on the regression coefficients which forces sparsity (1). The price paid for this modification is that estimates of individual coefficients are seriously biased, and statistical inference, as opposed to prediction, often becomes problematic. In (4), we showed that this price need not be paid if p/n stays bounded away from 1. We review the main theoretical results from that previous paper in Result 1 below.
From a practical standpoint, some of our key findings were:
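The contrast the abstract draws between least squares (optimal for Gaussian errors in the classical regime) and LAD (optimal for double-exponential errors) can be explored with a small simulation in the moderate-dimensional regime the article studies. This is an illustrative sketch, not code from the article: the sample sizes, the IRLS routine for LAD, and the choice of coefficient vector are all assumptions made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 100                      # p/n = 0.5: inside the regime 0 < p/n < 1
X = rng.standard_normal((n, p))      # Gaussian predictors, as in the article's setting
beta = np.ones(p) / np.sqrt(p)       # arbitrary true coefficients (assumption)
# Double-exponential (Laplace) errors: the case where LAD is the classical MLE.
y = X @ beta + rng.laplace(size=n)

def fit_ls(X, y):
    """Least squares fit (quadratic objective)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_lad(X, y, iters=100, eps=1e-8):
    """LAD fit (l1 objective) via iteratively reweighted least squares."""
    b = fit_ls(X, y)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - X @ b), eps)  # IRLS weights 1/|residual|
        Xw = X * w[:, None]
        b = np.linalg.solve(X.T @ Xw, Xw.T @ y)       # solve X'WX b = X'Wy
    return b

b_ls, b_lad = fit_ls(X, y), fit_lad(X, y)
print("LS  estimation error:", np.linalg.norm(b_ls - beta))
print("LAD estimation error:", np.linalg.norm(b_lad - beta))
```

Repeating such a simulation across values of p/n is one way to see the article's point empirically: which objective yields the smaller estimation error depends on the ratio p/n, not only on the error distribution.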


Similar resources

IMPROVED BIG BANG-BIG CRUNCH ALGORITHM FOR OPTIMAL DIMENSIONAL DESIGN OF STRUCTURAL WALLS SYSTEM

Among the different lateral force resisting systems, shear walls provide appropriate stiffness and hence are extensively employed in the design of high-rise structures. Architectural concerns regarding the safety of these structures have further widened the application of coupled shear walls. The present study investigated the optimal dimensional design of coupled shear walls based on the im...


An integrated fuzzy multiple objective decision framework to optimal fulfillment of engineering characteristics in quality function development

Quality function development (QFD) is a planning tool used to fulfill customer expectations, and QFD is a systematic process for translating customer requirements (WHATs) into technical descriptions (HOWs). QFD aims to maximize customer satisfaction related to enterprise satisfaction. The inherent fuzziness of relationships in QFD modeling justifies the use of fuzzy regression for estimating the r...



Optimal M-estimation in high-dimensional regression.

We consider, in the modern setting of high-dimensional statistics, the classic problem of optimizing the objective function in regression using M-estimates when the error distribution is assumed to be known. We propose an algorithm to compute this optimal objective function that takes into account the dimensionality of the problem. Although optimality is achieved under assumptions on the design...


A ridgelet kernel regression model using genetic algorithm

In this paper, a ridgelet kernel regression model is proposed for approximation of high-dimensional functions. It is based on ridgelet theory and kernel and regularization technology, from which we can deduce a regularized kernel regression form. Taking the objective function solved by quadratic programming to define the fitness function, we use a genetic algorithm to search for the optimal direction...


Optimal Pareto Parametric Analysis of Two Dimensional Steady-State Heat Conduction Problems by MLPG Method

Numerical solutions obtained by the Meshless Local Petrov-Galerkin (MLPG) method are presented for two dimensional steady-state heat conduction problems. The MLPG method is a truly meshless approach, and neither the nodal connectivity nor the background mesh is required for solving the initial-boundary-value problem. The penalty method is adopted to efficiently enforce the essential boundary co...



Journal:

Volume   Issue

Pages  -

Publication date: 2012